Incorporating window-based passage-level evidence in document retrieval
Abstract
This study investigated whether information retrieval can be improved if documents are divided into smaller subdocuments, or passages, and the retrieval scores for these passages are incorporated into the final retrieval score for the whole document. The documents were segmented by sliding a window of a given size across the document. Each time the window stopped, it extracted a fixed number of contiguous words. A retrieval score was calculated for each extracted passage, and the highest score obtained by a passage of that size was taken as the document's "window score" for that window size. A range of window sizes was tried. The experimental results indicated that a fixed window size of 50 words gave better results than other window sizes for the TREC test collection. This window size yielded a significant retrieval improvement of 24% compared to using the whole-document retrieval score. However, combining this window score with the whole-document retrieval score did not yield a retrieval improvement. Identifying the highest window score for each document (using window sizes varying from 50 to 400 words) and adopting it as the document retrieval score yielded a retrieval improvement of about 5% over taking the size-50 window score. Different window sizes were found to work best for different queries. If the best window size for each query could be predicted accurately, a maximum retrieval improvement of 42% could be obtained. However, no effective way has been found to predict which window size would give the best results for each query.
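The window-scoring procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the retrieval scoring function or how far the window moves at each step, so `density_score` (a toy term-density scorer), the default step of half the window size, and the particular sizes `(50, 100, 200, 400)` are all assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence


def window_passages(tokens: Sequence[str], size: int, step: int) -> List[List[str]]:
    """Return contiguous passages of `size` tokens, moving the window by `step`."""
    if len(tokens) <= size:
        # Document shorter than the window: the whole document is the only passage.
        return [list(tokens)]
    last_start = len(tokens) - size
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        # Add a final window so the tail of the document is covered.
        starts.append(last_start)
    return [list(tokens[s:s + size]) for s in starts]


def density_score(query_terms: Sequence[str], passage: Sequence[str]) -> float:
    """Hypothetical stand-in scorer (assumption): share of passage tokens that are query terms."""
    if not passage:
        return 0.0
    query = set(query_terms)
    hits = sum(1 for token in passage if token in query)
    return hits / len(passage)


def window_score(
    query_terms: Sequence[str],
    tokens: Sequence[str],
    size: int,
    step: int = 0,
    score: Callable[[Sequence[str], Sequence[str]], float] = density_score,
) -> float:
    """Highest passage score for one window size (the document's 'window score')."""
    if step <= 0:
        step = max(1, size // 2)  # assumed step; the abstract does not state one
    return max(score(query_terms, p) for p in window_passages(tokens, size, step))


def best_window_score(
    query_terms: Sequence[str],
    tokens: Sequence[str],
    sizes: Sequence[int] = (50, 100, 200, 400),  # assumed sample of the 50-400 range
) -> float:
    """Highest window score across several window sizes."""
    return max(window_score(query_terms, tokens, s) for s in sizes)


if __name__ == "__main__":
    doc = ["x"] * 100 + ["cat", "dog"] + ["x"] * 100
    query = ["cat", "dog"]
    print(window_score(query, doc, size=4, step=1))
    print(density_score(query, doc))
```

With this toy scorer, a short window that lands on a dense cluster of query terms scores higher than the whole document, which is the intuition behind using passage-level evidence. A real system would substitute its own retrieval model for `density_score`.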
Similar resources
Passage-level Evidence for Cross-Language Information Retrieval
Machine translation (MT) techniques can be used to generate a query in a target language from a query in a source language for cross-language information retrieval (CLIR). Recent MT systems have advanced enough to generate translations which are human-readable. However, translation error is still a serious impediment which hurts the effectiveness of a CLIR system. To compensate for defects ...
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
IIT TREC 2006: Genomics Track
For the TREC-2006 Genomics Track, we report on the effectiveness of composite information retrieval functions based on a dimensional data model for improving document, passage, and aspect search precision of genomics literature. We designed an approach, and developed a corresponding search engine, based on a novel dimensional data model capable of document, paragraph, sentence, and passage leve...
Comparing Document Segmentation Strategies for Passage Retrieval in Question Answering
Information retrieval (IR) techniques are used in question answering (QA) to retrieve passages from large document collections which are relevant to answering given natural language questions. In this paper we investigate the impact of document segmentation approaches on the retrieval performance of the IR component in our Dutch QA system. In particular we compare segmentations into discourse-b...
Enhancing Relevance Models with Adaptive Passage Retrieval
Passage retrieval and pseudo relevance feedback/query expansion have been reported in the literature as two effective means for improving document retrieval. Relevance models, while improving retrieval in most cases, hurt performance on some heterogeneous collections. Previous research has shown that combining passage-level evidence with pseudo relevance feedback brings added benefits. In this pap...
Journal title:
- J. Information Science
Volume 27, issue
Pages -
Publication date 2001